Abstract

Online services are typically replicated on multiple servers in
different datacenters, and have (at best) a loose association with
specific end-hosts or locations. To meet the needs of these online
services, we introduce SCAFFOLD, an architecture that provides
flow-based anycast with (possibly moving) service instances. SCAFFOLD
allows addresses to change as end-points move, in order to retain the
scalability advantages of hierarchical addressing. Successive
refinement in resolving service names limits the scope of churn to
ensure scalability, while in-band signaling of new addresses supports
seamless communication as end-points move.

We design, build, and evaluate a SCAFFOLD prototype that includes an
end-host network stack (built as extensions to Linux and the BSD socket
API) and a network infrastructure (built on top of OpenFlow and NOX).
We demonstrate several applications, including a cluster of web
servers, partitioned memcached servers, and migrating virtual machines,
running on SCAFFOLD.

1  Introduction 

The Internet is increasingly a platform for online services (such as
search engines, social networks, and content delivery) that are
replicated on servers in different locations. These services undergo
significant churn due to failures, planned maintenance, client
mobility, workload migration, and so on. In this paper, we present
SCAFFOLD, an architecture that meets the needs of these services by
supporting flow-based anycast with (possibly moving) service instances.
To support this communication abstraction, we rethink the relationship
between the network and the end-host stack, to simplify the design and
management of online services.

1.1  Service  Replication  and  Dynamics 

A service is a group of processes offering the same functionality
interchangeably (e.g., a client-facing web server in a replicated
tier). Services face two major challenges:

Replication. Services run on multiple servers, numbering from a few to
the hundreds of thousands, stretching from local-area clusters to
multiple datacenters. Rather than host-based unicast communication, we
argue that the main communication abstraction should be service-based
anycast, where each client binds to a particular instance of a named
service.

Principle: The network should enable communication with a service
group, with flow-based anycast that supports stateful connections to
replica instances.

SCAFFOLD’s anycast primitive can direct individual datagrams to
different replicas, while ensuring that packets of the same flow reach
the same (possibly moving) service instance, a property that we refer
to as flow affinity. Furthermore, each packet includes a service-level
identifier (or serviceID) that represents an application-level service
rather than a host. Thus, the network can forward traffic and allocate
resources based on the higher-level abstraction of a service. In
contrast, today’s IP packets contain only an end-point address.

Dynamism. Modern services operate in a dynamic environment, where a
replica may fail, undergo maintenance, migrate to a new location, seek
to offload work, or be powered down to save energy; new replicas may be
added to handle extra load or tolerate faults. This dynamism stretches
across many levels of granularity, from connections, to virtual
machines and physical hosts, to entire datacenters. Rather than hosts
retaining their addresses as they move, SCAFFOLD allows end-point
addresses to change dynamically. This allows networks to apply whatever
hierarchical addressing scheme they wish for more scalable routing, and
enables hosts to migrate across layer-two boundaries.

Principle: The network addresses associated with a service should be
able to change over time as service instances fail, recover, or move.

When an end-point moves, SCAFFOLD performs in-band signaling to update
the remote end-points of established flows. When a service instance
fails, recovers, or moves, the network automatically directs new
requests to the new location. In contrast, today’s network cannot
easily allow end-point addresses to change because these addresses are
exposed to (and cached by) applications.

1.2  Service-Centric  Network  Architecture

The main research contribution of SCAFFOLD is a “narrow waist” of
network support for flow-based anycast with dynamically-changing
service instances. As an architecture for communication with services
rather than devices, the “narrow waist” of SCAFFOLD includes
functionality normally considered part of the transport layer, along
with traditional network-layer functions. In particular, SCAFFOLD
provides (i) late binding to instances through successive refinement of
the service identifier to maximize flexibility and contain churn
(realizing our first principle), and (ii) automatic adaptation to
service dynamics through tight integration of the end-host network
stack with the network (realizing our second principle). Our solution
has three main components:

Packet headers (serviceIDs and network addresses): SCAFFOLD packets
include both the service identifiers and the network addresses of
communicating end-points. The network uses the destination serviceID to
direct a new flow to an instance of the named service (anycast), while
the network addresses ensure continued communication with that instance
(flow affinity). ServiceIDs are also used to remap a flow after a
failure and support service-based QoS in the network.

Network elements (service and network routers): SCAFFOLD consists of
service routers that direct a packet to a service instance based on the
serviceID, and network routers that forward packets based on
destination addresses. Service routers handle the first packet of each
flow, while the network routers directly forward the remaining packets.
Network routers do not keep per-service state, and neither type of
router keeps per-flow state, allowing network elements to scale to many
flows and services.

End-hosts (network stack and API): In SCAFFOLD, applications bind or
connect only to serviceIDs, so addresses can freely change as an
end-point moves. When an application binds (or closes) a socket, the
network stack automatically registers (or unregisters) the service
instance with the service router. When a service instance moves to a
new location, the network stack automatically updates the service
router(s) with the new address, and performs in-band signaling to
update the remote end-points of established flows.

After a brief comparison of SCAFFOLD to related work, the next section
presents case studies that illustrate the limitations of today’s
architecture for handling service replication and dynamics. Then,
Section 3 describes the service-level naming and socket API in
SCAFFOLD. Section 4 presents the main architectural contributions, with
a focus on a single datacenter. The wide-area aspects of SCAFFOLD are
discussed briefly in Section 5. We discuss security issues throughout
Sections 3-5. Section 6 presents our prototype, with network and
service routers built using OpenFlow [19] and NOX [12], and both
user-space and kernel-level network stacks built as extensions to
Linux, Click, and the BSD sockets API. Section 7 evaluates our
prototype, using both microbenchmarks and experiments with failover and
migration. The paper concludes in Section 8.

1.3  Comparison  to  Related  Work 

While our work relates to several areas of networking research,
SCAFFOLD is distinctive in proposing a comprehensive architecture,
revisiting the “division of labor” between the end-host stack and the
network, and having a running prototype implementation.

Content-centric networking: CCN [14] has a different focus, where names
correspond to chunks of content and routing does not consider host
addresses; SCAFFOLD names a (possibly stateful) service and includes
(possibly changing) host addresses in each packet. In contrast,
DONA [17] and TRIAD [11] perform name-based routing that provides a
server-selection function similar to that of SCAFFOLD’s service
routers; however, these papers do not discuss the end-host stack,
host-network integration, or service migration.

Flat service-level names: Several other papers advocate the use of
flat, service-level names [35, 2, 36, 33, 3]. However, these systems
take a different approach to name resolution by relying on a global
lookup service like DNS or a DHT; instead, SCAFFOLD uses successive
refinement to bind to a service instance. In addition, some of these
architectures use early binding [35, 2, 36], in contrast to SCAFFOLD’s
use of late binding.

Location/identifier separation: Recent protocols like LISP [7] and
HIP [23] separate host identifiers from locations for more scalable
routing and simpler multi-homing and mobility. However, LISP and HIP
focus on individual hosts, rather than anycast and services or the
range of dynamics we handle in SCAFFOLD.

Transport-layer migration: TCP Migrate [32] performs in-band signaling
and DNS updates when a mobile host’s address changes, but does not
consider other forms of migration (e.g., virtual machines and
multi-homing) or replicated services. SCTP [26] supports multi-homing
by specifying secondary addresses for hosts, but does not support other
forms of mobility. Trickles [31] handles server dynamics by moving
connection state to the client, but only for services with compact
state.

Routing protocols: SCAFFOLD is complementary to work on routing
architectures, in that our work does not focus on routing, beyond
allowing hosts to have topology-dependent addresses. A datacenter
running SCAFFOLD is free to select whatever routing design (e.g.,
[10, 25, 24]) it chooses. Similarly, inter-domain routing in SCAFFOLD
can use today’s BGP or (better yet) a more secure wide-area routing
solution [15, 1].

2  Case  Studies  of  Online  Services

In this section, we present case studies that motivate our two
principles for supporting online services: flow-based anycast (to
support replication) and network addresses that can change over time
(to support dynamism).

2.1  Replication 

Online services, whether front-end web services or back-end
infrastructure services, are replicated on many machines for better
performance and reliability.

2.1.1  Web  Server  Farm 

Web services can run on many servers spread across several datacenters.
Existing techniques for directing client requests to web servers have
significant limitations:

IP anycast: IP anycast (announcing the same IP prefix from each
datacenter) would allow the service to rely on wide-area routing to
direct clients to the “closest” datacenter. However, since different
packets in the same flow do not necessarily reach the same site, IP
anycast is typically limited to connectionless query-response
protocols. In addition, IP anycast increases the size of the global
routing tables, forcing routers to store routing information for many
more address blocks.

DNS: Session-based services, like HTTP, rely on other mechanisms, like
DNS, to return different IP addresses for the same service name.
However, when the set of servers changes, an out-of-band mechanism must
update the authoritative DNS servers. Better responsiveness requires
smaller DNS Time-To-Live (TTL), which makes DNS caching less effective;
in addition, many web browsers cache DNS responses for around 15
minutes, independent of the TTL.

Load balancers: Within a single datacenter, a front-end load balancer
can distribute requests sent to a single public-facing IP address.
However, load balancers must maintain state and handle all client
traffic to ensure flow affinity, particularly when failures may change
the server pool. Some out-of-band mechanism must update the load
balancer when the set of servers changes, and the load balancer itself
must be replicated to avoid a single point of failure. Making
load-balancing decisions on finer-grain names, such as URLs, typically
requires terminating the TCP connection to reconstruct and parse the
HTTP message. Since all client traffic goes through the load balancer,
the load balancer must lie close to the clients or the servers to
minimize latency.

In contrast, SCAFFOLD supports serviceIDs that can correspond to a web
site, a particular URL, or anything in between. Network support for
flow affinity obviates the need for all client traffic to traverse a
load balancer.


2.1.2  Back-end  Data-Storage  Services 

Online services rely on back-end storage services to maintain a
reliable and consistent view of service-specific data. To handle the
read and write load, the data store is commonly partitioned, with each
back-end server storing and handling requests for a subset of data
objects. For better reliability and performance, each partition might
be replicated across multiple servers. Compared to the web service
example, a back-end service has the luxury of modifying the software
running on its own front-end servers. The service must monitor server
liveness (to detect server additions and failures) and load (to ensure
proper load balancing over the instances). In addition, each read and
write request must be directed to a server responsible for the
associated object.

Today, each service implements server monitoring and request resolution
independently, and the existing solutions have scalability limitations:

Requester-side resolution: In systems like Memcached [20], each client
has the list of all servers and their associated “keyspace”.
Client-side resolution reduces lookup latency, but sending updated
server lists to all clients limits scalability and inhibits freshness.

Resolver-side resolution: In systems like Dynamo [6], the back-end
servers run a routing protocol that locates a correct server for each
request. This allows front-end servers to send requests to any back-end
server, at the expense of higher request latency and the overhead of
the routing protocol.

In contrast, SCAFFOLD has a general framework for monitoring server
liveness and load, freeing back-end services from implementing it
individually. In addition, a back-end service can assign a service
identifier to each partition, delegating server resolution to the
network.

2.2  Dynamism 

An end-point’s location may change due to client mobility, server
migration, or failures. Changing the host’s IP address disrupts ongoing
flows and requires out-of-band updates to direct future requests to the
right place.

2.2.1  Client  Mobility 

Internet users increasingly expect seamless access to services as they
move. However, connections are identified by the fixed IP addresses of
the two end-points, leading to clumsy techniques for handling mobility:

Virtual LANs: Mobility within a single layer-two network is relatively
easy, since the client can retain its IP address. However, even within
an enterprise network, this requires complex, inefficient Virtual LAN
(VLAN) configurations that place all wireless access points in a common
layer-two subnet and force inter-VLAN traffic to traverse an
intermediate gateway router. In addition, Ethernet switches are slow to
react when the host changes locations, due to out-of-date cached
entries in switch forwarding tables.

Mobile IP: Mobility across layer-two boundaries changes the client IP
address. Solutions like Mobile IP [28] allow the server to direct
traffic through an intermediate “home agent,” at the expense of
additional infrastructure for redirecting traffic and the performance
degradation from “triangle routing.”

In contrast, SCAFFOLD allows a host’s address to change as it moves,
allowing each network to use hierarchical addressing for better routing
scalability, while minimizing the “stretch” experienced by data
traffic.

2.2.2  Virtual  Machine  Migration 

Online services increasingly run as virtual machines (VMs) hosted on
physical servers. VM migration is a promising way to consolidate server
capacity and move services closer to their users. VM migration is
conceptually simpler than client mobility because (i) migration is
planned, whereas client mobility is unplanned, and (ii) the service can
include its own mechanisms for VM migration without changing the client
software. However, existing techniques remain clumsy:

Gratuitous ARP: Today, a VM cannot easily migrate outside of its
layer-two subnet, since the VM retains its IP address. For faster
migration within a layer-two subnet, the VM can send a broadcast packet,
such as an unsolicited ARP (Address Resolution Protocol) response, to
update the forwarding tables in the learning switches. The gratuitous
ARP also serves to update other hosts if the VM’s MAC address has
changed, at the expense of the overhead of the extra broadcast traffic.

Mobile IP: Migrating across layer-two boundaries raises the same
challenges as client mobility, and existing solutions like Mobile IP
have the same limitations.

In  contrast,  SCAFFOLD  allows  a  server  to  change  its 
address  as  it  moves,  allowing  ongoing  client  traffic  to 
flow  directly  to  the  new  location  while  simultaneously 
directing  future  client  requests  to  the  new  address. 

2.2.3  Failover,  Maintenance,  and  Load  Shedding 

Servers frequently go down (due to equipment failures or planned
maintenance), or need to shed load by directing some traffic to other
service instances. Continuing a connection on another service instance
relies on application-specific solutions (e.g., shared connection state
at the servers, or clients with mechanisms like HTTP “range requests”
that can fetch the remainder of a response). Still, the network also
plays an important role in directing client traffic to the new service
instance:

DNS: After detecting a service failure, a client can re-resolve the
service name to an IP address. However, the new DNS lookup may return
the address of the failed service instance, due to caching at the local
DNS server (and some servers’ practice of not obeying TTLs).

ARP spoofing: To hide failures from clients, the replacement server can
perform “ARP spoofing” to assume the IP address of the old server.
However, ARP spoofing only works for servers within the same subnet,
and forces the new server to assume the load and function of an entire
machine, rather than a specific service or connection.

Instead, when a SCAFFOLD client detects a failure (either through a
local timeout or an explicit FAIL message), the network stack
re-resolves the serviceID to a registered instance of the service. This
ensures fast failover to a live service instance, for applications that
can take advantage of this. In contrast to today’s solutions, SCAFFOLD
has a single mechanism for handling a wide range of failures (e.g.,
connection, server process, host, rack, and datacenter), both planned
and unplanned.
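
As a rough illustration of this failover path, the Python sketch below re-resolves a serviceID after a failure signal. The `registry` dictionary, the `resolve` helper, the `ToyFlow` class, and the lowest-address selection rule are all hypothetical stand-ins for the service router and the network stack; none of them is taken from the SCAFFOLD prototype.

```python
# Toy model: `registry` plays the role of the service router's table,
# mapping each serviceID to the hostAddrs of its registered instances.
registry = {"X": {"B", "C"}}


def resolve(service_id):
    """Return one registered instance (lowest address, for determinism)."""
    live = registry.get(service_id, set())
    if not live:
        raise LookupError(f"no registered instance for {service_id!r}")
    return min(live)


class ToyFlow:
    """Client-side flow state; the application only ever sees service_id."""

    def __init__(self, service_id):
        self.service_id = service_id
        self.remote = resolve(service_id)

    def on_failure(self):
        """Invoked on a local timeout or an explicit FAIL message."""
        self.remote = resolve(self.service_id)
        return self.remote


flow = ToyFlow("X")          # initially bound to instance "B"
registry["X"].discard("B")   # instance B fails and is unregistered
flow.on_failure()            # the flow is re-resolved to instance "C"
```

The same `on_failure` path serves planned and unplanned events alike, since both reduce to an instance disappearing from the registered set.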

3  Service-Centric  Abstractions 

In  this  section,  we  describe  how  services  are  named,  and 
discuss  the  abstraction  provided  by  SCAFFOLD  sockets. 

3.1  Service  Naming 

A serviceID is a fixed-length, location-independent name for a
particular service. Each serviceID maps to a group of processes, or
instances, that are functionally equivalent. A serviceID could
correspond to the contents of a Web site, a partition in a storage
system, or an individual file. If an application needs to communicate
with a particular instance (e.g., a sensor in a particular location),
these individuals should be named separately. System designers identify
the functionality to name (see footnote 1).

Like other architectures with flat names, SCAFFOLD does not dictate how
clients learn of serviceIDs, but envisions that they are typically sent
or copied between applications, much like URIs, with little human
intervention. We purposefully do not specify how to map human-readable
names to serviceIDs, which removes the legal tussle over naming from
the basic architecture [5, 35]. Based on their own trust relationships,
users may turn to different directory services, search engines, or
social networks to resolve human-readable names to serviceIDs.

1  For example, our port of the memcached key-value store, described in
§7, uses one name for all memcached servers (to identify resources that
can host key-value partitions) and an additional name for each
partition (so clients can identify where keys are stored).

Figure 1: The SCAFFOLD packet header, with a decoupling between who
(the serviceID), where (the hostAddr), and which connection (the
socketID). Other fields like packet length, version number, and
checksum are omitted for simplicity.

Table 1: Comparison of BSD socket protocol families: sockaddr data
structures in TCP/IP include both an IP address and port number, while
SCAFFOLD structures include only a serviceID.

In pushing naming into the network architecture, serviceIDs can provide
a basis for secure end-to-end communication. In particular, if
serviceIDs are self-certifying identifiers [18] (cryptographic hashes
of services’ public keys), one end-point could then verify that the
other end-point is an authorized instance of its requested service. If
desired, the public key named by these serviceIDs can also be used to
establish encrypted and authenticated connections between end-points.
This does come at the cost of increased serviceID length (256 bits vs.
96 bits), however. On the other hand, today’s Internet provides
end-to-end authentication only at the application layer (e.g., through
SSL and certificate authorities), which has limited its deployment.
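
To make the self-certifying idea concrete, here is a minimal Python sketch. It assumes SHA-256 as the hash, chosen only because its 256-bit digest matches the length quoted above; the text does not name a hash function. The function names and the example key bytes are likewise hypothetical.

```python
import hashlib
import hmac


def service_id_from_public_key(public_key: bytes) -> bytes:
    """Derive a 256-bit serviceID as the SHA-256 digest of the key bytes.

    Assumption: SHA-256 is an illustrative choice, not one the paper states.
    """
    return hashlib.sha256(public_key).digest()


def key_matches_service_id(service_id: bytes, presented_key: bytes) -> bool:
    """Check that a presented public key is the one named by the serviceID."""
    expected = service_id_from_public_key(presented_key)
    return hmac.compare_digest(service_id, expected)


# Placeholder key material, for illustration only.
service_key = b"example-public-key-bytes"
service_id = service_id_from_public_key(service_key)
```

A matching hash only shows that the presented key is the one the name refers to. The remote end-point would still have to prove possession of the corresponding private key, for example by signing a fresh challenge, before it could be treated as an authorized instance.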

3.2  SCAFFOLD  Sockets 

Given the primacy of services in SCAFFOLD, applications should be able
to initiate communication with service names. Correspondingly, SCAFFOLD
defines a new BSD protocol family (PF_SCAFFOLD) that refers to
serviceIDs, rather than the network addresses of the IP protocol family
(PF_INET). The BSD sockets API’s flexibility allows such new protocol
families to be defined and implemented with relative ease, and SCAFFOLD
therefore retains compatibility with the BSD API itself. Table 1
highlights the main differences in the use of the above protocol
families.

Allowing network addresses to change: The network addresses of the
communicating endpoints are not exposed to applications. Hiding
addresses from applications is crucial since these addresses may change
over time if a session or process moves. Other high-level socket
interfaces hide addresses from applications; for example, the Java
socket API accepts a hostname, and the new WebSockets API [37] in many
browsers accepts a URL. However, these interfaces simply perform DNS
resolution before using a standard BSD socket connect, and thus do not
allow addresses to change over the lifetime of the connection.

Connecting unreliable flows: To support flow affinity, SCAFFOLD must
distinguish between a new flow (one that is not yet bound to a service
instance) and a bound flow. As such, both reliable streams and
unreliable datagrams use a connection-establishment mechanism (unlike
today’s “connectionless” UDP). Thus, we use the term connection to
refer to a bound flow, independent of its reliability.


Updating the network: To better handle service churn, the bind and
close calls interact with the network to register and unregister
serviceIDs, respectively. For example, if host B no longer provides
service X (i.e., the application on B closes its socket bound to
serviceID X), the network stack unregisters X. This tight coupling
between the end-host stack and the network ensures the membership of
the service group remains up-to-date.
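
The coupling between socket calls and registration can be sketched as follows. This is a toy Python model, not the PF_SCAFFOLD API: `ToyServiceRouter` and `ToySocket` are hypothetical names, and a real stack would send registration messages over the network rather than call the router object directly.

```python
class ToyServiceRouter:
    """Holds the group membership: serviceID -> set of hostAddrs."""

    def __init__(self):
        self.table = {}

    def register(self, service_id, host_addr):
        self.table.setdefault(service_id, set()).add(host_addr)

    def unregister(self, service_id, host_addr):
        members = self.table.get(service_id, set())
        members.discard(host_addr)
        if not members:
            # Drop the entry once no instance offers the service.
            self.table.pop(service_id, None)


class ToySocket:
    """The application names only a serviceID; the stack does the rest."""

    def __init__(self, router, host_addr):
        self.router = router
        self.host_addr = host_addr   # hidden from the application
        self.service_id = None

    def bind(self, service_id):
        self.service_id = service_id
        self.router.register(service_id, self.host_addr)

    def close(self):
        if self.service_id is not None:
            self.router.unregister(self.service_id, self.host_addr)
            self.service_id = None


router = ToyServiceRouter()
sock_b, sock_c = ToySocket(router, "B"), ToySocket(router, "C")
sock_b.bind("X")
sock_c.bind("X")
sock_b.close()   # host B no longer provides X; only C remains registered
```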

4  SCAFFOLD  Architecture 

This section elaborates on the two central aspects of SCAFFOLD’s
design: support for replication through anycast with flow affinity, and
support for dynamism through resource registration and in-band address
renegotiation. We restrict our consideration to the local area,
expanding to wide-area networking in the next section.

To support service naming and flow-based anycast, SCAFFOLD introduces a
new packet header. Every packet includes the serviceID, network
address, and socket identifier for both the source and destination, as
shown in Figure 1; these fields are fixed-length for fast processing in
hardware. The address field includes a “host” address that identifies
the network attachment point (in the RFC 1498 [30] sense), so that a
host can attach to multiple networks simultaneously or migrate a
connection from one interface to another. These fields add up to
between 40 and 92 bytes in length, depending on whether clients include
service names and whether serviceIDs are self-certifying for secure
communication.
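
The two ends of that range can be reproduced with simple arithmetic under one plausible reading, sketched below in Python. The serviceID lengths (96 and 256 bits) come from Section 3.1. The 28-byte total for the two hostAddr and two socketID fields is an inference made here so that both quoted totals work out; the text gives no per-field sizes, so treat that constant, and the reading of "clients include service names" as "the source serviceID is present", as assumptions.

```python
# ServiceID lengths in bytes, from the 96-bit and 256-bit figures in the text.
SERVICE_ID_BYTES = {"plain": 96 // 8, "self_certifying": 256 // 8}

# Assumption: combined size of both hostAddrs and both socketIDs.
# Inferred from the quoted 40- and 92-byte totals, not stated in the paper.
ADDR_AND_SOCKET_BYTES = 28


def header_bytes(kind: str, include_source_service_id: bool) -> int:
    """Total size of the header fields under the assumptions above."""
    n_service_ids = 2 if include_source_service_id else 1
    return ADDR_AND_SOCKET_BYTES + n_service_ids * SERVICE_ID_BYTES[kind]


smallest = header_bytes("plain", include_source_service_id=False)
largest = header_bytes("self_certifying", include_source_service_id=True)
```

Under this reading, `smallest` is 28 + 12 = 40 bytes and `largest` is 28 + 2 × 32 = 92 bytes, matching the range quoted above.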

4.1  Flow-Based  Anycast 

SCAFFOLD supports flow-based anycast through successive refinement of
the first packet in a flow, rather than through a single lookup in a
global name-resolution service. The first packet of a flow is directed
to a service router that selects a service instance (with a hostAddr),
and the receiving host then assigns the socketID. Successive refinement
improves scalability by scoping churn in the set of hosts offering a
service, particularly in the wide-area setting as discussed in
Section 5.

The network consists of service routers (that resolve serviceIDs to
hostAddrs) and network routers (that forward packets based on their
destination addresses). While one device could perform both functions,
the two roles are conceptually distinct, as illustrated in Figure 2.
Upon initiating a connection to service X (via a connect call), the
client A’s network stack sends a SYN packet with the destination
serviceID X and the destination address unset (i.e., all 0s), as shown
in step (1) in Figure 3. The network ensures that the SYN packet
reaches an instance of service X (e.g., on server B).

Figure 2: SCAFFOLD network elements, including service routers (that
resolve a serviceID to a hostAddr) and network routers (that forward
packets based on the hostAddr). Also shown are the forwarding tables,
with a numbered output port for each entry.

Figure 3: Establishing a bound flow by routing and resolving the first
packet based on destination serviceID.

Service router: Upon receiving a packet with an unresolved destination
address, the network router directs the packet to a service router for
resolution. For each serviceID, the service router stores addresses
corresponding to one or more service instances. In Figure 2, serviceID
X maps to hosts B and C. Upon receiving the SYN packet, the service
router looks up X to set the destination hostAddr (e.g., destination B
in step (2)). Our implementation supports randomized selection among
matching entries using a weighted proportional split, although other
policies are feasible. While service routers handle the first packet of
each flow, the remaining packets are typically handled by network
routers.
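
A weighted proportional split of the kind mentioned above can be sketched in a few lines of Python. The table layout, the weights, and the `pick_instance` name are hypothetical; the sketch only illustrates the selection policy, not the OpenFlow/NOX implementation.

```python
import random

# Hypothetical service-router table: serviceID -> [(hostAddr, weight), ...].
service_table = {"X": [("B", 3), ("C", 1)]}


def pick_instance(service_id, rng=random):
    """Choose a hostAddr with probability proportional to its weight."""
    entries = service_table[service_id]
    addrs = [addr for addr, _ in entries]
    weights = [weight for _, weight in entries]
    return rng.choices(addrs, weights=weights, k=1)[0]


# Resolve the first packet of a new flow addressed to serviceID "X".
dst_host_addr = pick_instance("X")
```

With weights 3 and 1, roughly three quarters of new flows would be sent to B. Because the choice is made only for the first packet, the router needs no per-flow state to keep later packets on the same instance.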

Figure 4: End-host state on client host A, with network addresses
hidden from applications.

Network router: Like today’s IP routers, a SCAFFOLD network router
forwards packets based on the destination address, except for two
additional functions. First, network routers have a special forwarding
entry to direct unresolved packets to a service router; otherwise, the
network routers forward packets based on the destination address.
Second, as an optimization for fast failure notification, the network
router may send a failure message (indicating that the service instance
is unreachable) back to the sender (akin to ICMP host unreachable) if
no end-host matches the address. This is discussed in §4.2. A typical
network would consist mostly of network routers, with a smaller set of
service routers.

End-host network stack: SCAFFOLD’s network stack hides
network addresses from applications, as illustrated in
Figure 4, which shows the state for client A. Both the
application and the network stack know the local serviceID
(e.g., U), set when an application binds a socket. The stack’s
socket state also includes a local address and socketID
(e.g., A:p) that the application does not see. The hostAddr A
is unique to the host’s physical interface, while the socketID
(e.g., p) is a locally-unique identifier the stack assigns when
creating the socket. This socketID allows the stack to
demultiplex packets to the appropriate socket after the
connection is established; this practice is unlike today’s
sockets, which demultiplex packets based on the addresses and
port numbers of both end-points of the connection, making it
difficult for either end-point to change its identifiers.

The server’s network stack identifies the receiving
application and assigns a socketID. Upon receiving the SYN
packet from client A, the server B’s network stack
demultiplexes the packet to an application based on the
destination serviceID X. The stack also assigns a
locally-unique socketID, r, and includes the hostAddr and
socketID in the return SYN-ACK packet shown in step (3) of
Figure 3. Upon receiving the SYN-ACK, host A considers the
socket bound and records the remote identifier B:r, before
sending an ACK packet. The ACK packet, like subsequent packets
in the connection, travels directly to the remote end-point
without traversing the service router. This completes the
three-way connection establishment for a bound flow.
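
The handshake and the two demultiplexing rules (by serviceID for the SYN, by socketID afterwards) can be sketched as a toy model. Everything here is an assumption made for illustration: the `Stack` class, the dictionary-based packets, the state names, and the 48-bit socketID width are not taken from the prototype.

```python
import secrets


class Stack:
    """Toy end-host stack: one hostAddr, many sockets keyed by socketID."""

    def __init__(self, host_addr):
        self.host_addr = host_addr
        self.listeners = {}  # serviceID -> application (set by bind)
        self.sockets = {}    # local socketID -> per-connection state

    def _new_socket_id(self):
        # 48 random bits; the width is an illustrative choice.
        sid = secrets.randbits(48)
        while sid in self.sockets:
            sid = secrets.randbits(48)
        return sid

    def bind(self, service_id, app):
        self.listeners[service_id] = app

    def connect(self, dst_service):
        sid = self._new_socket_id()
        self.sockets[sid] = {"remote": None, "state": "SYN_SENT"}
        # Destination address is left unset; a service router fills it in.
        return {"type": "SYN", "service": dst_service,
                "src": (self.host_addr, sid), "dst": None}

    def receive(self, pkt):
        if pkt["type"] == "SYN":
            # The first packet is demultiplexed by destination serviceID.
            app = self.listeners[pkt["service"]]
            sid = self._new_socket_id()
            self.sockets[sid] = {"app": app, "remote": pkt["src"],
                                 "state": "SYN_RCVD"}
            return {"type": "SYN-ACK", "src": (self.host_addr, sid),
                    "dst": pkt["src"]}
        # Every later packet is demultiplexed by the local socketID alone.
        sock = self.sockets[pkt["dst"][1]]
        if pkt["type"] == "SYN-ACK":
            sock["remote"], sock["state"] = pkt["src"], "BOUND"
            return {"type": "ACK", "src": (self.host_addr, pkt["dst"][1]),
                    "dst": pkt["src"]}
        if pkt["type"] == "ACK":
            sock["state"] = "BOUND"
        return None


def handshake(client, server, service_id):
    """Run steps (1)-(3) plus the final ACK; return (client ID, server ID)."""
    syn = client.connect(service_id)
    syn["dst"] = (server.host_addr, None)  # step (2): service router resolves
    syn_ack = server.receive(syn)          # step (3)
    ack = client.receive(syn_ack)
    server.receive(ack)
    return syn_ack["dst"][1], syn_ack["src"][1]
```

Because each side looks up later packets by its own socketID only, the remote address stored in the socket state can change without disturbing the lookup.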
Event       End-Host Trigger      Network Action
----------  --------------------  -----------------------------------
join        Interface link up     add (hostaddr, loc) at net routers;
                                  send joined(hostaddr) to host
leave       Link/host down, or    rem (addr, *) from net routers;
            host unavailable      rem (*, addr) from srv routers
register    Socket bound          add (srvid, addr) at srv routers
unregister  Socket closed         rem (srvid, addr) from srv routers

Table 2: End-host updates to network and service routers


The three-way handshake is overkill for services that need
only a simple datagram abstraction, where the client sends a
single datagram to the destination serviceID and the receiver
optionally sends a response. SCAFFOLD can support unbound
datagrams in two different ways. First, each end-point can
send packets with both addresses unset, requiring resolution
through a service router in both directions. This avoids
per-flow state in the end-host network stack, at the cost of
higher stretch for return traffic. Second, the client can send
packets with the source address set (much like the setup phase
for bound flows), which allows the recipient to bypass
serviceID resolution in the return direction.

4.2  Automatic  Updates  Under  Dynamics 

A service instance (or its underlying host) can easily fail
or move to a new location, and these changes may be planned
(e.g., planned maintenance or virtual machine migration) or
unplanned (e.g., server failure or client mobility). When
changes happen, SCAFFOLD automatically updates the service
and network routers to direct new flows correctly. In
addition, the end-host stack updates the remote end-points of
ongoing connections with a new hostAddr and socketID. This
tighter integration between services, hosts, and the network
allows SCAFFOLD to support seamless service in the face of
change.

4.2.1  Updating  the  Service  and  Network  Routers 

When  a  host  interface  joins  or  leaves  the  network,  or  a 
host  starts  or  stops  supporting  a  service,  the  routers  are 
updated  automatically,  as  summarized  in  Table  2. 

Join:  When  an  interface  connects  to  the  network,  a 
hostAddr  is  assigned  and  the  network  routers  are  updated 
to  forward  packets  toward  this  address.  For  example, 
each  network  router  in  Figure  2  has  a  forwarding-table 
entry  for  hostAddr  B. 

Register: When an application binds to a serviceID, the
network stack registers the service instance with the service
router. For example, the service router in Figure 2 has an
entry mapping serviceID X to hostAddr B.

Unregister: When an application closes a socket, the stack
unregisters the serviceID with the service router. If the
application on server B performs a close, the service router
deletes the mapping of serviceID X to hostAddr B, and directs
future flows to host C.


Leave:  When  an  interface  fails  or  shuts  down,  the  net¬ 
work  routers  are  updated  to  stop  forwarding  packets  to 
the  associated  hostAddr.  In  addition,  the  service  router  is 
updated  to  remove  all  entries  for  this  hostAddr.  For  ex¬ 
ample,  as  part  of  shutting  down  host  B,  the  network  stack 
could  “leave”  the  network  to  explicitly  update  the  service 
and  network  routers.  When  the  interface  fails,  heartbeat 
messages  can  detect  the  failure  and  trigger  the  “leave” 
event  on  the  host’s  behalf. 
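
The four events of Table 2 amount to simple updates on two tables. The sketch below assumes a single in-memory controller; the `Controller` class and its field names are hypothetical and say nothing about how the prototype distributes these entries to routers.

```python
class Controller:
    """Toy model of the router state touched by the events in Table 2."""

    def __init__(self):
        self.net_routes = {}  # hostAddr -> location (e.g., a switch port)
        self.srv_routes = {}  # serviceID -> set of hostAddrs

    def join(self, host_addr, loc):
        # add (hostaddr, loc) at net routers; send joined(hostaddr) to host
        self.net_routes[host_addr] = loc
        return ("joined", host_addr)

    def leave(self, host_addr):
        # rem (addr, *) from net routers; rem (*, addr) from srv routers
        self.net_routes.pop(host_addr, None)
        for service_id in list(self.srv_routes):
            self.unregister(service_id, host_addr)

    def register(self, service_id, host_addr):
        # add (srvid, addr) at srv routers
        self.srv_routes.setdefault(service_id, set()).add(host_addr)

    def unregister(self, service_id, host_addr):
        # rem (srvid, addr) from srv routers
        addrs = self.srv_routes.get(service_id)
        if addrs is None:
            return
        addrs.discard(host_addr)
        if not addrs:
            del self.srv_routes[service_id]  # no instance left for this ID
```

Note that `leave` implies an `unregister` for every serviceID the host offered, which is why a failed interface needs no per-service cleanup from the application.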

The tight integration between the end-host and the network
ensures a fast response to both planned and unplanned changes.
Automatically updating the service and network routers
prevents accidental inconsistencies that can easily arise in
configuring load balancers and DNS servers in today’s
networks. These join/leave and register/unregister primitives
also allow an interface to change addresses when connecting to
a new location (i.e., by having the end-host stack register
the serviceIDs with the new hostAddr), or a host to start
receiving traffic on an alternate interface (i.e., by
registering the serviceIDs with the other interface’s address).

Our  architecture  leaves  network  designers  with  many 
ways  to  handle  join/leave  and  register/unregister  events, 
ranging  from  a  centralized  controller  to  a  flooding  pro¬ 
tocol.  For  example,  our  prototype  uses  a  logically- 
centralized  controller  to  install  table  entries  in  both  the 
network  and  service  routers,  by  intercepting  join/leave 
and  register/unregister  events  sent  by  the  end-host  stack. 

Securing registration: SCAFFOLD must secure the control path
that governs serviceID registration, as otherwise an
unauthorized entity could register itself as hosting the
service. Even if end-points authenticate one another during
connection setup using self-certifying serviceIDs, faulty
registrations would serve as a denial-of-service attack. To
prevent this attack, the registering end-point must prove that
it is authorized to host the serviceID. This can be
accomplished using similar authentication mechanisms, based on
self-certifying serviceIDs (where the registering host either
knows the service’s private key itself, or has its own keypair
certified by the service’s key). On the other hand, local
networks can employ simpler mechanisms as well, e.g., place
the control channel on a virtually-isolated network, as
opposed to relying on cryptographic security.

4.2.2  Updating  the  End-Points  of  Ongoing  Flows 

SCAFFOLD  uses  a  single  mechanism — in-band  address 
renegotiation,  akin  to  TCP  Migrate  [32] — to  allow  an 
ongoing  connection  to  continue  across  many  different 
sources  of  churn  (e.g.,  connection  or  virtual-machine  mi¬ 
gration,  client  mobility,  and  load  balancing  for  multi¬ 
homed  hosts).  When  a  service  instance  fails,  SCAF¬ 
FOLD  can  also  support  failover  to  another  service  in¬ 
stance,  for  applications  that  want  it. 
In-band signaling to change addresses: When an
end-point  moves,  its  network  stack  sends  an  RSYN 
packet — with  the  new  hostAddr  and  socketID  in  the 
source  field — to  the  remote  end-point.  Upon  receiv¬ 
ing  the  RSYN,  the  stationary  end-point  includes  the  new 
identifiers  in  its  socket  table,  and  generates  a  new  sock¬ 
etID  for  its  end  of  the  connection  before  sending  an 
RSYN-ACK.  Creating  new  socketIDs  at  both  end-points 
ensures  the  renegotiation  process  is  robust  to  out-of- 
order  packets,  even  when  an  end-point  changes  loca¬ 
tion  multiple  times.  The  mobile  end-point  completes  the 
renegotiation  process  by  sending  a  final  ACK  to  acknowl¬ 
edge  the  new  socketID  of  the  stationary  end-point.  The 
end-points  retransmit  the  RSYN  and  RSYN-ACK  packets 
until  they  are  acknowledged. 
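
One way to picture the stationary end-point's side of this exchange is sketched below. It is an assumption-laden illustration rather than the verified protocol: the class, the tuple-based messages, and the choice to keep the old socketID valid until the final ACK arrives (so that a retransmitted RSYN gets the same RSYN-ACK) are all hypothetical.

```python
import secrets


class StationaryEndpoint:
    """Toy handling of RSYN / RSYN-ACK / ACK at the end-point that stays put."""

    def __init__(self, host_addr, local_id, remote):
        self.host_addr = host_addr
        self.current_id = local_id  # socketID the peer currently addresses
        self.remote = remote        # (hostAddr, socketID) of the peer
        self.pending = None         # (old_id, new_id, new_remote) until ACK

    def on_rsyn(self, dst_id, new_remote):
        if self.pending and (dst_id, new_remote) == (self.pending[0],
                                                     self.pending[2]):
            # Retransmitted RSYN: answer with the same RSYN-ACK as before.
            return ("RSYN-ACK", (self.host_addr, self.pending[1]), new_remote)
        if dst_id != self.current_id:
            return None  # wrong or stale socketID: ignore the packet
        new_id = secrets.randbits(48)
        while new_id == dst_id:
            new_id = secrets.randbits(48)
        self.pending = (dst_id, new_id, new_remote)
        return ("RSYN-ACK", (self.host_addr, new_id), new_remote)

    def on_ack(self, dst_id):
        # The final ACK must name the freshly generated socketID.
        if self.pending and dst_id == self.pending[1]:
            _, self.current_id, self.remote = self.pending
            self.pending = None
            return True
        return False
```

After `on_ack` succeeds, an RSYN addressed to the old socketID is ignored, which gives the stationary side a simple way to tell an earlier migration from the latest one.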

To  ensure  our  protocol  handles  complex  “corner 
cases”  correctly,  we  modeled  our  solution  in  Promela  and 
used  SPIN  [13]  to  verify  correctness  under  packet  loss, 
out-of-order  packet  delivery,  and  end-points  that  move 
multiple  times.  In  addition  to  detecting  subtle  bugs  in 
our  original  design,  using  Promela/SPIN  helped  us  iden¬ 
tify  several  properties  needed  for  correctness:  (i)  The 
stationary  host  must  be  able  to  determine  if  an  RSYN 
message  reflects  a  migration  that  occurred  before  or  af¬ 
ter  the  last  migration  of  the  same  remote  end-point;  (ii) 
the  RSYN  message  must  be  idempotent  across  multiple 
address  changes;  and  (iii)  if  an  end-point  moves  to  mul¬ 
tiple  locations  at  once  (as  can  happen  with  a  VM  copy  or 
if  there  is  a  network  partition),  the  stationary  host  must 
commit  to  only  one  of  the  locations.  In  particular,  the 
first  two  observations  drove  our  decision  to  change  the 
socketID  of  the  stationary  end-point.2 

Failover to another service instance: If a service instance
fails (e.g., the application process crashes or closes a
socket), the network stack can respond to incoming packets
with a FAIL message that quickly notifies the remote end-point
about the failure. (Optionally, if the physical machine fails,
the incident network router could generate a FAIL message.)
After detecting a failure (via a FAIL message or a local
timeout), an end-point can initiate a new connection with
another service instance by initiating a three-way RSYN
handshake with an unresolved destination address (i.e., all
0s). The RSYN packet would go to a service router that would
then select a service instance. Of course, failover only makes
sense for certain kinds of applications, where the server
instances share enough state to continue the connection, or
the client has enough information to request the remainder of
a response (e.g., the “range request” feature in HTTP). As
such, SCAFFOLD sockets have a “want failover” option that
allows the client to request failover semantics from the
server; this option triggers the setting of a “want failover”
flag in every packet.

2 The situation is more complicated if both end-points change
locations at the same time (e.g., a mobile client moves while
a server virtual machine migrates), because neither
end-point’s RSYN packet would successfully reach the other
end-point. We plan to handle “double migration” by having the
end-point’s old location detect RSYN messages sent to a
recently-moved end-point. For example, if a VM migrates, the
network stack of the old physical host could direct the mobile
client’s RSYN to the VM’s new physical location. Extending our
design to handle “double migration” is part of our ongoing
work.

Figure 5: Wide-area SCAFFOLD network architecture and
connection establishment.

Securing migration and failover: Malicious, off-path entities
should not be able to disrupt ongoing connections by spoofing
migration (RSYN), failover (FAIL), or connection close
(FIN/RST) messages. To prevent such attacks, SCAFFOLD uses
long, randomly-chosen socketIDs for its connections. Because
SCAFFOLD only accepts control messages that include the
correct destination socketID, off-path attackers must guess
this socketID through brute-force enumeration. While this is
analogous to the use of randomized sequence numbers in TCP and
randomized transaction IDs in DNS, SCAFFOLD’s socketID should
be larger than these identifiers (e.g., 48 bits), as the same
socketID may persist throughout the life of a connection. To
secure against migration attacks by on-path entities, we can
use the authenticated channels provided by self-certifying
socketIDs.
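
To give a feel for why the identifier width matters, the following back-of-the-envelope sketch estimates the brute-force effort for a few widths. The attacker rate of one million spoofed packets per second is an assumption chosen only for illustration, and the function names are hypothetical.

```python
def expected_guesses(bits):
    """Roughly the average number of guesses to hit one random ID."""
    return 2 ** (bits - 1)


def expected_days(bits, guesses_per_second):
    """Average time to guess the ID at a fixed spoofing rate, in days."""
    return expected_guesses(bits) / guesses_per_second / 86400


if __name__ == "__main__":
    rate = 1_000_000  # assumed attacker rate (spoofed packets per second)
    for bits in (16, 32, 48):
        print(bits, expected_guesses(bits), round(expected_days(bits, rate), 3))
```

Under this assumed rate, a 16-bit identifier falls in a fraction of a second on average, while a 48-bit identifier takes on the order of years, which matters because a socketID can stay fixed for the lifetime of a connection.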

5  SCAFFOLD  in  the  Wide  Area 

Until now, we have described SCAFFOLD running in a single
datacenter network, where the service routers can resolve all
serviceIDs and the network routers can reach all host
addresses. This section expands our architecture to the wide
area, to support name resolution and routing between multiple
SCAFFOLD networks. We first discuss how service systems and
autonomous systems manage the global name and address spaces,
respectively. Then we describe how routers forward packets
based on successively-refined addresses, and conclude with a
discussion of wide-area resolution of serviceIDs.

5.1  Service  and  Autonomous  Systems 

Much like today’s Internet, a SCAFFOLD network consists of
multiple administrative domains. However, we have separate
notions of the administrative domains that manage serviceIDs
and hostAddrs, respectively:

Service Systems (SSes) manage their own part of the serviceID
namespace, ensuring that each serviceID is unique. SSes are
identified by a globally-unique SS identifier (ssID) that
forms the high-order bits of the serviceID. SSes are
responsible for providing the authoritative name resolution
for their serviceIDs. So, while serviceIDs are
location-independent (in the sense that service instances may
reside anywhere), the allocation and resolution of serviceIDs
is hierarchical to ensure serviceIDs are unique and that the
resolution process is scalable.

Autonomous Systems (ASes) consist of routers and hosts visible
to the external world as a single entity (e.g., a datacenter,
enterprise, or residential access network), just as in today’s
Internet. Each AS is identified by a globally-unique, routable
network address (ASAddr) that serves as the basis of wide-area
routing. This ASAddr appears in the high-order bits of a
SCAFFOLD network address, ensuring that the entire address
ASAddr:hostAddr is globally unique.

An SS typically corresponds to an administrative entity, while
ASes connote a physical network location. For example, we
envision that a single organization (like Google, Microsoft,
or Amazon) may have a single SS, but have a separate AS for
each of its datacenters, as shown in Figure 5. The SS would
ensure serviceID uniqueness and perform authoritative
resolution, while one or more ASes host the service instances
that comprise a service. This logical separation supports a
wide range of usage scenarios and business arrangements, from
full in-house naming, resolution, and hosting, to moving each
aspect to a (possibly different) third-party provider.

5.2  Hierarchical  Network  Addresses 

A  SCAFFOLD  network  address  consists  of  a  fixed- 
length  ASAddr  and  a  locally-meaningful  hostAddr.  In 
practice,  an  AS  could  subdivide  the  hostAddr  bits  to 
introduce  multiple  levels  of  hierarchy,  as  is  common 
in  today’s  IP  networks.  For  simplicity,  we  focus  on  a 
two-level  hierarchy  where  wide-area  routing  relies  only 
on  the  ASAddr  and  intra-AS  routing  relies  only  on  the 
hostAddr.  This  model  offers  several  advantages: 

Scalable inter-domain routing: Wide-area routing operates on
fixed-length ASAddrs, similar to the approach in AIP [1]. This
leads to smaller routing tables and simpler packet forwarding,
compared with the many variable-length IP prefixes in today’s
routing system.

Hiding intra-AS service dynamics: Wide-area resolution of a
serviceID need only set the ASAddr, rather than the hostAddr
of a specific service instance. As service instances
(un)register or move within an AS, global name resolution does
not have to change, unless an AS no longer has any hosts
offering a service.


Successive refinement of destination addresses: A sender does
not necessarily need to know the destination hostAddr, just
the destination ASAddr. The sender can leave the hostAddr
unset (i.e., all 0s), allowing the service router in the
destination AS to select a specific service instance.

Consider the example in Figure 5, where a single SS a consists
of three ASes (say, datacenters). Suppose AS 1 handles
wide-area requests for serviceIDs managed by the SS. Upon
receiving a SYN packet for destination serviceID X, the
service router in AS 1 identifies a suitable destination AS
(e.g., AS 2) and changes the destination ASAddr accordingly.
The packet then reaches AS 2, where a network router forwards
the unresolved packet to the local service router. The service
router sets the destination hostAddr to one of the local
instances of service X, as shown earlier in Figure 3. Upon
receiving the SYN packet, the host sends a SYN-ACK with its
hostAddr and socketID. The SYN-ACK packet and all future
packets (in both directions) bypass the resolution process,
and travel directly between the sending and receiving ASes
through network routers.
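
The two resolution steps in this example can be sketched as successive rewrites of a destination pair (ASAddr, hostAddr). The tables, the numeric addresses, and the "take the first entry" policy below are placeholders invented for the illustration; a real service router would apply its own selection policy.

```python
UNSET = 0  # stands in for an all-0s address field

# Hypothetical tables. The authoritative router maps a serviceID to ASes
# hosting it; each AS-local router maps it to hostAddrs inside that AS.
AUTHORITATIVE = {"X": [2, 3]}
LOCAL = {2: {"X": [11, 12]}, 3: {"X": [21]}}


def refine_as(service_id, dst):
    """Authoritative step: choose the destination AS, leave hostAddr as is."""
    _, host_addr = dst
    return (AUTHORITATIVE[service_id][0], host_addr)  # placeholder policy


def refine_host(service_id, dst):
    """Local step: fill in hostAddr only if it is still unset."""
    as_addr, host_addr = dst
    if host_addr != UNSET:
        return dst  # already resolved; network routers just forward it
    return (as_addr, LOCAL[as_addr][service_id][0])   # placeholder policy


def resolve(service_id, first_as=1):
    """Follow a SYN from the AS it was first sent to down to a host."""
    dst = (first_as, UNSET)
    dst = refine_as(service_id, dst)
    dst = refine_host(service_id, dst)
    return dst
```

Only `LOCAL` changes when instances register or move within an AS; `AUTHORITATIVE` changes only when an AS gains or loses its last instance of a service.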

5.3  Scalable  Resolution  of  Service  IDs 

To resolve a serviceID, a sender must know which AS to use as
the initial target of the SYN packet; this AS should have a
service router with up-to-date information about where the
current service instances are located. SCAFFOLD does not
dictate how the sender identifies this AS, and several
different approaches are possible. A wide-area implementation
of SCAFFOLD could leverage existing approaches, such as:

Hierarchical name resolution (like today’s DNS): Name
resolution could proceed through a hierarchical collection of
name servers, similar to today’s DNS. The name-resolution
servers in the hierarchy would correspond to the parties
responsible for managing and delegating portions of the
serviceID namespace.

Dissemination via wide-area routing (similar to LISP-ALT [8]):
Each SS could have name resolution performed by service
routers in one or more ASes, and these ASes could announce the
ssID into a global routing protocol. For example, the SS a in
Figure 5 could announce its ssID from ASes 1, 2, and 3,
ensuring unresolved packets reach a nearby AS that can
identify a (possibly different) AS providing the desired
service.

Authoritative service routers must be updated when mappings
from serviceIDs to ASAddrs change. SCAFFOLD does not dictate
how this intra-SS update protocol is implemented, and
different SSes may employ different techniques. One
possibility is to run a decentralized update protocol between
service routers, so that local changes are disseminated
hierarchically, propagating between ASes only if they affect
wide-area resolution.
IP Header Field          SCAFFOLD Purpose   Limit
-----------------------  -----------------  -----
SRC, DST ports           serviceID          65K
Type of Service          flags              n/a
Bits 0-8, IP address     ASAddr             256
Bits 8-16, IP address    hostAddr           256
Bits 16-32, IP address   sockID             65K

Table 3: SCAFFOLD’s usage of IPv4 header and transport ports.

6  Prototype  Implementation 

An  architectural  design  like  SCAFFOLD  would  be  in¬ 
complete  without  incorporating  implementation  experi¬ 
ence.  Through  a  working  prototype,  we  can  (i)  evaluate 
the  performance  and  scalability  of  the  architecture  and 
learn  of  unforeseen  design  issues,  (ii)  explore  incremen¬ 
tal  deployment  strategies,  and  (iii)  port  applications  in 
order  to  evaluate  the  effort  involved,  and  learn  whether 
applications  can  benefit  from  SCAFFOLD  abstractions. 

To  implement  SCAFFOLD,  we  chose  an  incremental 
approach  that  leverages  existing  platforms  like  Click  [16] 
for  the  end-host  stack,  and  OpenFlow/NOX  [19,  12]  for 
network  elements.  These  platforms  allow  us  to  rapidly 
prototype  and  evaluate  our  implementation.  Moreover, 
OpenFlow  gives  us  a  path  towards  hardware  implemen¬ 
tation  in  commercial  switches,  by  leveraging  (and  per¬ 
haps  influencing)  the  ongoing  development  of  a  standard 
platform.  A  further  goal  is  to  deploy  SCAFFOLD  on  a 
variety  of  platforms — such  as  Linux,  Mininet  [21],  Plan- 
etLab  [29],  VINI  [4],  and  GENI  [9]— for  larger-scale 
evaluations.  Since  some  of  these  platforms  only  support 
user-space  operation,  we  chose  to  implement  our  end- 
host  stack  so  that  it  runs  in  both  user  and  kernel  space — 
to  achieve  both  deployability  and  performance. 

6.1  OpenFlow  and  IPv4  Headers 

Since  OpenFlow  currently  supports  only  IPv4,  our  pro¬ 
totype  repurposes  the  20  bytes  of  the  IPv4  header,  plus 
the  combined  4  bytes  of  the  source  and  destination 
ports  of  transport  headers  to  implement  the  SCAFFOLD 
protocol. SCAFFOLD’s use of IPv4 header fields is shown in
Table 3, alongside the resulting scalability limits. Using
the high-order bits of the IP address as the
ASAddr  allows  SCAFFOLD  to  use  prefix-based  IP  rout¬ 
ing  across  the  wide-area  (and  BGP  for  inter-domain  route 
updates).  The  combination  of  ASAddr  and  hostAddr  per¬ 
mits  (again,  prefix-based)  IP  routing  in  intra-domain  set¬ 
tings  as  well,  which  enables  mixed  deployments  using 
both  IP  and  SCAFFOLD  routers.  A  future  implementa¬ 
tion  would  use  its  own  native  headers,  but  requires  more 
flexible  header  matching  envisioned  for  future  releases 
of  OpenFlow. 
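
The address layout in Table 3 can be illustrated by packing the three fields into one 32-bit IPv4 address. This sketch assumes that "bits 0-8" are the most significant bits (consistent with the ASAddr occupying the high-order bits); the helper names `pack` and `unpack` are hypothetical.

```python
import ipaddress


def pack(as_addr, host_addr, sock_id):
    """Build an IPv4 address as ASAddr (8 bits) | hostAddr (8) | sockID (16)."""
    if not (0 <= as_addr < 256 and 0 <= host_addr < 256
            and 0 <= sock_id < 65536):
        raise ValueError("field exceeds the limits listed in Table 3")
    return ipaddress.IPv4Address((as_addr << 24) | (host_addr << 16) | sock_id)


def unpack(ip):
    """Split an IPv4 address back into (ASAddr, hostAddr, sockID)."""
    value = int(ipaddress.IPv4Address(ip))
    return (value >> 24) & 0xFF, (value >> 16) & 0xFF, value & 0xFFFF
```

For example, `pack(2, 5, 0x1234)` yields the address 2.5.18.52. Under this layout a /8 prefix selects an AS and a /16 prefix selects a host, which is what lets ordinary prefix-based IP routing carry SCAFFOLD packets.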

6.2 OpenFlow/NOX-based Network

The  resolution  and  routing  components  of  the  SCAF¬ 
FOLD  network  are  based  on  the  OpenFlow-enabled 


OpenVSwitch  software  switch  [27],  which  allows  the  dy¬ 
namic  insertion  of  packet-matching  rules  in  its  forward¬ 
ing  tables.  SCAFFOLD  proactively  installs  destination- 
based  resolution  and  forwarding  rules  in  the  service  and 
network routers. Service routers resolve packets by
matching on the serviceID and selecting a service instance
according to its rule set. Network routers forward
based  solely  on  matching  the  destination  address,  which 
simplifies  the  routing  table  and  minimizes  the  rule  space 
required. 

At the heart of the SCAFFOLD network implementation is a
centralized controller running on the NOX network control
platform [12]. The controller application, about 5000 lines of
Python and 2000 lines of C++, implements the network API for
managing host and service-related events, computes forwarding
rules and resolution policies, manages SCAFFOLD router rule
installation, and monitors network load. While the SCAFFOLD
architecture  is  amenable  to  distributed  control,  using  a 
centralized  scheme  not  only  simplifies  the  implementa¬ 
tion,  but  provides  a  basis  for  exerting  tighter  control  over 
network-service  interaction  and  making  joint  decisions 
on  traffic  engineering  and  service  selection. 

While  essential  to  our  goal  of  incremental  implemen¬ 
tation  and  deployability,  OpenFlow  did  not  always  ful¬ 
fill  our  needs.  The  SCAFFOLD  anycast  primitive  re¬ 
quired  judicious  modification  of  OpenVSwitch  code.  We 
needed  a  way  to  choose  a  specific  rule  out  of  an  equiv¬ 
alent  set  to  select  a  service  instance,  instead  of  always 
choosing  the  highest  priority  rule,  which  OpenFlow  does 
by default. Our solution reinterprets the priority as a
proportional weight for rules matching the same serviceID.
This allowed us to implement a weighted proportional
split for resolving packets according to a specified
distribution.  While  non-trivial,  the  new  feature  required 
only  400  lines  of  code.  Note  that  the  OpenFlow  roadmap 
includes  a  proportional  rule  selection  mechanism. 

6.3  Fast  and  Portable  End-Host  Stacks 

In  designing  the  host  stack,  we  sought  to  retain  compat¬ 
ibility  with  the  BSD  sockets  API  to  simplify  porting  of 
applications.  Adding  support  for  SCAFFOLD  sockets  to 
applications  should  not  be  much  more  work  than  adding, 
e.g., IPv6 support. Early experience in porting applications
supports this view, as detailed in §7.

The  SCAFFOLD  stack  was  implemented  for  a  Linux 
2.6.34  kernel  and  consists  of  16792  lines  of  C++  code 
shared  between  the  user-space  and  kernel-space  versions. 
Because  SCAFFOLD  blurs  the  layer  boundaries  between 
network  and  transport,  and  because  we  required  the  stack 
to  run  in  both  user  space  and  kernel,  we  had  to  re¬ 
implement  much  functionality.  This  includes  network- 
layer  functionality,  as  well  as  unreliable  datagram  and 
reliable  stream  transport  (i.e.,  UDP  and  TCP  adapted  for 
SCAFFOLD). Although our two versions of the stack
share  most  of  their  logic,  there  are  some  differences  be¬ 
tween  them.  The  user-space  version’s  socket  library  ex¬ 
poses  a  BSD  sockets  API  and  communicates,  via  IPC, 
with  the  SCAFFOLD  stack  running  as  a  user  process. 
The  kernel  version,  on  the  other  hand,  implements  the 
backends  of  the  BSD  socket  system  calls  in  a  kernel 
module,  which  hooks  directly  into  the  SCAFFOLD  stack 
running  as  a  kernel  thread.  Both  the  user  stack  and  kernel 
thread  are  implemented  using  Click  [16].  In  both  modes, 
the  stack  intercepts  SCAFFOLD  packets  by  attaching  it¬ 
self  to  the  network  device. 

In  comparison  to  a  traditional  TCP/IP  stack,  the 
SCAFFOLD  stack  has  a  tighter  host/network  integra¬ 
tion.  BSD  socket  calls,  like  bind  and  close,  trigger 
network  interaction  (e.g.,  service  registration).  Hooking 
such  interaction  into  socket  calls  makes  it  transparent  to 
applications, and makes porting easier. Further, seamless
handling of failover, migration, and mobility requires
a decoupling of connection management from transport
protocols,  along  with  new  connection  states,  and  an  API 
call  to  initiate  failover/migration.  The  BSD  sockets  API 
supports  such  extensions  using  its  ioctl  interface,  but 
their  usage  is  optional  in  applications. 

Our  end-host  stack  currently  lacks  certain  features  and 
performance  optimizations,  such  as  window  scaling  for 
TCP. We expect a future “production quality” kernel-
only  implementation  to  reuse  much  of  the  existing  TCP 
code  in,  e.g.,  the  Linux  kernel. 

7  Evaluation 

We  aim  to  show  that  our  architecture  design  is  both  prac¬ 
tical  and  functional  in  terms  of:  (i)  portability — namely 
that  SCAFFOLD  support  can  be  added  to  applications 
with  relative  ease;  (ii)  performance — that  our  stack  and 
routers perform reasonably and that there are no inherent
limitations to our design; and (iii) dynamism, namely that
both  planned  and  unplanned  dynamism  (e.g.,  failures, 
migration,  and  maintenance)  can  be  handled  gracefully 
and  without  unnecessary  disruption  to  services. 

To  this  end,  we  start  by  reviewing  the  effort  needed  to 
bring  SCAFFOLD  support  to  applications.  We  then  con¬ 
tinue  with  describing  our  experimental  setup,  followed 
by  micro-benchmarks  that  show  the  performance  of  our 
end-host stack. We then move on to a number of illustrative
experimental scenarios that highlight the dynamism
of SCAFFOLD. Finally, we conclude with two case studies.
The first shows that SCAFFOLD can support virtual
machine  migration  across  broadcast  domains,  something 
not  possible  with  today’s  infrastructure.  The  second  ex¬ 
plores  how  the  abstractions  offered  by  SCAFFOLD  can 
be  used  to  make  memcached,  a  popular  back-end  service, 
simpler  and  more  robust. 


Application               Version   Codebase   Changes
------------------------  -------   --------   -------
Iperf                     2.0.0        5,934       240
TFTP                      5.0          3,452        90
PowerDNS                  2.9.17      36,225       160
Wget                      1.12        87,164       207
Elinks browser            0.11.7     115,224       234
Mongoose web server       2.10         8,831       425
Memcached server          1.4.5        8,329       159
Memcached client          0.40        12,503       184
Apache Benchmark / APR    1.4.2       55,609       244

Table 4: Applications currently ported to SCAFFOLD, as well as
the size (in lines of code) of the original codebase and the
extent of changes needed for porting.

7.1  Application  Portability 

We  have  added  SCAFFOLD  support  to  a  range  of  net¬ 
work  applications  to  demonstrate  the  ease  of  adoption. 
Because  many  network  applications  today  come  with 
support  for  both  IPv4  and  IPv6,  they  already  have  the 
necessary  abstractions  to  simplify  the  addition  of  another 
family.  Hence,  adding  SCAFFOLD  support  typically  in¬ 
volves  adding  a  sockaddr_sf  socket  address  along¬ 
side  the  IPv4  and  IPv6  equivalents.  Further  modifica¬ 
tions  involve  handling  SCAFFOLD  specific  errors  from 
socket  calls,  and  adding  failover/migration  handling  for 
applications  that  need  such  functionality. 

Table  4  gives  an  overview  of  the  applications  we  have 
ported  and  the  number  of  lines  of  code  changed.  These 
numbers are higher than strictly necessary; we were not
attempting to be parsimonious, and we often added wrappers
for common BSD socket calls to support both user- and
kernel-level versions of our stack. The user-level version
redirects the calls to a SCAFFOLD socket library instead
of the standard system libraries, and thus necessitates
renaming these functions to avoid name conflicts (e.g.,
bind becomes bind_sf, and so forth). In our experience,
adding SCAFFOLD support typically takes a few hours to a
day, depending on application complexity.

7.2  Experimental  Setup 

The  test  environment  we  use  for  our  experiments  mod¬ 
els  a  simple  datacenter  setup,  and  consists  of  a  nine-node 
topology  with  up  to  five  hosts,  two  network  routers,  one 
service  router  and  a  network  controller,  as  illustrated  in 
Figure  6.  While  obviously  small  in  scale,  we  use  this 
setup  to  demonstrate  some  of  the  dynamics  one  encoun¬ 
ters  in  real  settings.  All  links  are  switched  GigE.  Each 
node  has  2  quad-core  2.3  GHz  CPUs  and  three  GigE 
ports,  running  Ubuntu  9.04.  Host  kernels  are  patched 
with  support  for  Click  version  1.8.0. 

7.3  Host-stack  and  Router  Performance 

Table 5 shows the TCP performance of the SCAFFOLD
host-stack implementation, both kernel and user-level,
in comparison to the Linux TCP/IP stack. The numbers
were acquired while performing five 10-second TCP
transfers between two hosts in our setup using Iperf.

Figure 6: Experimental setup for evaluation.

Stack                          Mean (Mbit/s)    Stdev (Mbit/s)
TCP/IP                                 929.8               5.3
SCAFFOLD (kernel)                      596.6              17.0
SCAFFOLD (user)                        110.1              16.1
SCAFFOLD (user with tracing)            82.3               8.8

Router                         Mean (Kpkts/s)   Stdev (Kpkts/s)
Service (Resolution)                   12.99              0.17
Network (Data forwarding)              13.25              1.47

Table 5: Performance comparison of the TCP/IP stack and the
SCAFFOLD stack’s reliable stream protocol, running in both user
and kernel space. The table also shows processing rates for the
service and network routers for 64-byte packets.

Although the SCAFFOLD stack lacks a number of TCP
optimizations, its kernel-mode throughput is roughly
two-thirds (about 64%) of that of the native TCP/IP stack.
This performance degradation arises because our current
implementation lacks TCP window scaling and uses a
64 KB window size. Therefore, after slow start and additive
increase, we are limited by the under-sized receive window,
and a single flow cannot claim the full bandwidth of our
links. This is not fundamental to SCAFFOLD: we are in the
process of adding such optimizations, and this performance
gap should narrow greatly.
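
The window limitation follows from the standard bound that a
sender can have at most one receive window in flight per round
trip, so throughput is at most window / RTT. The sketch below
evaluates that bound for a 64 KB window. The round-trip times in
the loop are illustrative values chosen for this example, not
measurements from our testbed, and the function names are ours.

```python
def window_limited_throughput_mbps(window_bytes: int, rtt_seconds: float) -> float:
    """Upper bound on single-flow throughput: one full window per round trip."""
    if rtt_seconds <= 0:
        raise ValueError("rtt_seconds must be positive")
    return window_bytes * 8 / rtt_seconds / 1e6


def rtt_for_throughput(window_bytes: int, throughput_mbps: float) -> float:
    """Round-trip time (seconds) at which the window bound equals a given rate."""
    if throughput_mbps <= 0:
        raise ValueError("throughput_mbps must be positive")
    return window_bytes * 8 / (throughput_mbps * 1e6)


if __name__ == "__main__":
    window = 64 * 1024  # 64 KB window, no window scaling
    # Illustrative RTTs only; these are assumptions, not measured values.
    for rtt_ms in (0.25, 0.5, 1.0, 2.0):
        bound = window_limited_throughput_mbps(window, rtt_ms / 1000)
        print(f"RTT {rtt_ms:4.2f} ms -> at most {bound:8.1f} Mbit/s")
```

With window scaling, the window term grows and the bound moves
above the link rate, which is why we expect the gap to narrow.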

To  make  sure  a  single  flow  can  claim  the  full  band¬ 
width,  we  introduced  bandwidth  shaping  at  hosts.  Shap¬ 
ing  allows  a  configurable  maximum  rate  of  packets  to 
be  transmitted  and,  therefore,  competing  flows  share  the 
limited  bandwidth  rather  than  claiming  chunks  of  the  un¬ 
used  bandwidth. 

Table  5  also  shows  the  packet  processing  rate  of  our 
service  and  network  routers.  The  multiple-rule  match¬ 
ing  in  service  routers  has  a  slightly  higher  overhead. 
While included for completeness, these measurements
primarily evaluate the performance of the Open vSwitch
software router; hardware implementations would see
orders-of-magnitude improvements.

Figure 7: High availability with two clients and two servers,
showing how a client is transparently redirected to another service
instance when a failure occurs.


7.4  High  Availability  with  Failover 

SCAFFOLD’s handling of churn allows services to
maintain high availability in the face of failures. We
illustrate this with an experiment in which we force a
server process to fail and ongoing flows are seamlessly
redirected to the remaining server instances.

Figure 7 shows the TCP goodput of two Wget [38]
clients (hosts 1 and 2 in Figure 6), each served by a
separate instance of the same mongoose [22] Web service
(hosts 3 and 4). Bandwidth shaping limited the maximum
rate to 5 Mbps (see footnote 3). The clients each download
a 200 MB file, with client 2 starting around 70 seconds
after client 1. They are initially directed to one service
instance each, due to the load balancing scheme. At the
170 second mark, we trigger a failure in one of the server
processes, causing client 2 to fail over to the server
instance serving client 1. The failover completes within a
couple of round-trip times (i.e., the time needed to
complete an RSYN handshake). Client 1 finishes its request
at the 500 second mark and client 2 can then utilize the
full bandwidth for the remainder of its request.

3 The spikes in throughput seen as clients initiate their requests are
due to bandwidth shaping: the shaper needs a number of packets to
learn the correct rate at which to shape the traffic.

7.5 Load Balancing and Shedding

Figure 8: Replicated service support with 2 clients and 3 servers,
showing load balancing as additional servers are added, request
shedding for planned maintenance, and the residual effects of
lingering requests with draining.

To demonstrate SCAFFOLD’s ability to scale a distributed
service using anycast resolution, we ran an experiment
representative of a typical front-end web server farm, as
shown in Figure 8. A network delay of 100 ms is applied
to emulate link latency and improve visualization. Without
this delay, requests would complete too quickly and provide
little insight into the request load characteristics of the
system. In the experiment, from time 0 to 40 seconds,
2 Wget clients issue 3 HTTP requests per second to download
a 100 KB file from a web service running mongoose. As the
request load increases on Server 1, we add servers:
Server 2 at the 5 second mark and Server 3 at the 10 second
mark. SCAFFOLD automatically balances requests across the
new service instances as the active request count of the
three servers begins to converge. At 20 seconds, Server 3
is gracefully shut down for planned maintenance by closing
its listening socket and by invoking a system call that
results in a FAIL message being sent to all of its active
connections. This allows Server 3 to quiesce quickly
(80% of active connections shed in < 1 second). The active
connections are then re-resolved to the other server
instances, as seen by the subsequent increases in requests
at Servers 1 and 2. In contrast, the current practice of
server draining for maintenance, which is shown starting
at the 30 second mark on Server 2, delays the server
shutdown until the longest-lived connection finishes, here
at the 53 second mark.
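
The difference between shedding and draining can be made concrete
with a small model. The sketch below is an assumption-laden toy, not
our service router logic: it assumes a least-active-requests policy
for re-resolution (the text above does not specify the balancing
policy), and the request names and lifetimes are invented for the
example.

```python
from typing import Dict, List


def pick_least_loaded(active: Dict[str, List[str]]) -> str:
    """Return the server with the fewest active requests (ties broken by name)."""
    if not active:
        raise RuntimeError("no service instances remain")
    return min(sorted(active), key=lambda server: len(active[server]))


def shed(active: Dict[str, List[str]], leaving: str) -> Dict[str, List[str]]:
    """Shedding: remove `leaving` and re-resolve each of its requests elsewhere."""
    orphans = active.pop(leaving)
    for request in orphans:
        active[pick_least_loaded(active)].append(request)
    return active


def drain_finish_time(now: float, remaining_lifetimes: List[float]) -> float:
    """Draining: shutdown waits for the longest-lived active connection."""
    return now + max(remaining_lifetimes, default=0.0)


if __name__ == "__main__":
    # Hypothetical snapshot of active requests per server.
    state = {"server1": ["a"], "server2": ["b", "c"], "server3": ["d", "e", "f"]}
    print(shed(state, "server3"))
    # Hypothetical remaining lifetimes (seconds) when draining starts at t=30.
    print(drain_finish_time(30.0, [5.0, 23.0, 12.0]))
```

In this model shedding completes as soon as the requests are
re-resolved, whereas the drain time is set entirely by the slowest
remaining connection.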

7.6  VM  Migration 

Today, it is not possible to seamlessly migrate virtual
machines across layer-2 subnets, but SCAFFOLD enables
such functionality with its in-band signaling. We
performed a proof-of-concept experiment using VirtualBox
[34], in which we migrated guest VMs across host machines
on different network segments. The connections were
maintained across migration, with a transfer pause ranging
from 0.5 to 2.5 seconds. This delay is primarily due to
our need to externally signal the VM after migration
occurs so that it cycles its network interface to get
assigned a new SCAFFOLD address. VirtualBox, like most
VMs, uses gratuitous ARP for layer-2 migration; going
forward, we will modify the VM migration code to signal
its kernel of an “interface up” event instead.

7.7  Dynamic  Memcached 

Memcached is a popular backend service that provides
a distributed hash table to clients (typically web servers)
with get/set key-value semantics. To use memcached,
clients need to maintain a list of memcached servers that
make up the hash-table storage. They use this list to
decide which server is responsible for a certain partition
of the keyspace, and hence should be contacted for
particular keys. Memcached itself does not provide any
means to keep this server list up-to-date, and many
deployments perform manual administration.

Figure 9: Memcached Server Throughput. Server 2 joins the
network after around 15 seconds; server 1 leaves after 30 seconds.
In both cases, the network transparently redistributes the data
partitions (named by unique serviceIDs) over the available servers.

With SCAFFOLD, the server selection and keyspace
partitioning can be made more dynamic by moving them
from clients to the network, and delegating their
management to the service router and controller. To enable
this memcached dynamism, we name partitions by serviceIDs,
and clients issue requests to partitions instead of
specific servers. Hence, a server responsible for a
specific partition is resolved via the service router when
a request is made. When a new memcached server registers
with the network, the controller reassigns some partitions
from existing servers to the new one (like Dynamo’s
tokens [6]). When an instance is unregistered (or
overloaded), the controller reassigns all (or some) of its
partitions simply by changing rules in the service routers.
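
The controller behavior described above can be sketched as a table
from partition serviceID to owning server. This is a minimal model
under our own assumptions, not the prototype's controller code: the
PartitionTable class, its method names, and the "take from the most
loaded server until the newcomer has a fair share" policy are all
hypothetical choices made for the example.

```python
from collections import Counter
from typing import Dict, Set


class PartitionTable:
    """Toy controller state: partition serviceID -> owning server."""

    def __init__(self, num_partitions: int) -> None:
        self.num_partitions = num_partitions
        self.servers: Set[str] = set()
        self.owner: Dict[int, str] = {}

    def _load(self) -> Counter:
        """Partitions currently owned by each registered server."""
        load = Counter({server: 0 for server in self.servers})
        load.update(self.owner.values())
        return load

    def register(self, server: str) -> None:
        """A new server takes some partitions from existing servers."""
        if server in self.servers:
            return
        self.servers.add(server)
        if len(self.servers) == 1:
            self.owner = {p: server for p in range(self.num_partitions)}
            return
        fair_share = self.num_partitions // len(self.servers)
        for _ in range(fair_share):
            load = self._load()
            donor = max(sorted(self.servers - {server}), key=lambda s: load[s])
            partition = min(p for p, s in self.owner.items() if s == donor)
            self.owner[partition] = server

    def unregister(self, server: str) -> None:
        """A departing server's partitions all move to the remaining servers."""
        if server not in self.servers:
            return
        if len(self.servers) == 1:
            raise RuntimeError("cannot remove the last server")
        self.servers.remove(server)
        orphaned = sorted(p for p, s in self.owner.items() if s == server)
        for partition in orphaned:
            load = self._load()
            target = min(sorted(self.servers), key=lambda s: load[s])
            self.owner[partition] = target

    def resolve(self, service_id: int) -> str:
        """What a service router lookup for this partition would return."""
        return self.owner[service_id]


if __name__ == "__main__":
    table = PartitionTable(num_partitions=8)
    table.register("server1")
    table.register("server2")      # some partitions move to server2
    print(Counter(table.owner.values()))
    table.unregister("server1")    # all of server1's partitions move
    print(Counter(table.owner.values()))
```

Clients in this model only ever call resolve with a partition's
serviceID, so they never see the server list change.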

Figure 9 demonstrates the behavior of memcached on
SCAFFOLD with three clients and two servers. In the
experiment, the three clients issue set requests (each
with a data object of 1024 bytes) with random keys at a
total rate of 14,000 requests per second on average.
Requests are sent using SCAFFOLD’s unbound datagram.
In the beginning, only one memcached server is operating.
Around the 15 second mark, a second server comes online,
while the first server leaves the network after 30 seconds.
Figure 9 illustrates that, with the network reassigning
partitions following server churn, the system reacts
quickly to dynamism and each server receives its
appropriate fraction of requests.

8  Conclusions 

Accessing large, distributed, replicated services is a
hallmark of today’s Internet; yet, the underlying network
does not support these applications well. As we have
outlined in this paper, the central challenges of
service-centric networking are replication and dynamism,
which span the classic problems in networking: naming,
addressing, and routing. SCAFFOLD takes a “clean-slate”
approach to the problem by supporting flow-based anycast
with service instances and rethinking the division of
labor between end-hosts and the network. We believe that
SCAFFOLD is a promising approach that can make future
services easier to design, implement, and manage, as
evidenced by our prototype and the set of applications we
have ported to SCAFFOLD.